---
title: Assemble unstructured custom models
description: Unstructured models can use arbitrary data for input and output, allowing you to deploy and monitor models regardless of the target type.

---

# Assemble unstructured custom models

If your custom model doesn't use a target type supported by DataRobot, you can create an unstructured model. Unstructured models can use arbitrary (_i.e., unstructured_) data for input and output, allowing you to deploy and monitor models regardless of the target type. This characteristic of unstructured models gives you more control over how you read the data from a prediction request and response; however, it requires precise coding to assemble correctly. You must implement [custom hooks to process the unstructured input data](#unstructured-custom-model-hooks) and generate a valid response. 

{% include 'includes/structured-vs-unstructured-cus-models.md' %}

Inference models support unstructured mode, where input and output are not verified by DataRobot and can be almost anything; you are responsible for verifying their correctness. For assembly instructions specific to unstructured custom inference models, reference the model templates for <a target="_blank" href="https://github.com/datarobot/datarobot-user-models/tree/master/model_templates/python3_unstructured">Python</a> and <a target="_blank" href="https://github.com/datarobot/datarobot-user-models/tree/master/model_templates/r_unstructured">R</a> provided in the DRUM documentation.

!!! note "Data format"
    When working with unstructured models, DataRobot supports data as text or binary files.

## Unstructured custom model hooks {: #unstructured-custom-model-hooks }

Include any necessary hooks in a file called `custom.py` for Python models or `custom.R` for R models alongside your model artifacts in your model folder:

??? note "Type annotations in hook signatures"
    The following hook signatures are written with Python 3 type annotations. The Python types match the following R types:

    Python type         | R type       | Description
    --------------------|--------------|------------
    `None`              | `NULL`       | Nothing
    `str`               | `character`  | String
    `bytes`             | `raw`        | Raw bytes
    `dict`              | `list`       | A list of key/value pairs
    `tuple`             | `list`       | A list of data
    `Any`               | An R object  | The deserialized model
    `*args`, `**kwargs` | `...`        | These are keyword arguments, not types; they serve as placeholders for additional parameters.
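
Putting the hooks together, a minimal `custom.py` for an unstructured model might look like the following sketch. The word-count logic and the dummy model object are hypothetical, for illustration only:

``` py
# custom.py -- a minimal sketch of an unstructured custom model.
# The "model" here is a placeholder; a real model would load an artifact.

def load_model(code_dir):
    # Return any object; DRUM passes it to score_unstructured as `model`.
    return "dummy model"

def score_unstructured(model, data, **kwargs):
    # `data` arrives as str or bytes depending on the request's Content-Type.
    text = data.decode("utf8") if isinstance(data, bytes) else data
    # Hypothetical scoring logic: count the words in the input.
    return str(len(text.split()))
```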


**************************************************


### `init()` {: #init }

The `init` hook is executed only once at the beginning of the run to allow the model to load libraries and additional files for use in other hooks.

``` py
init(**kwargs) -> None
```

#### `init()` input {: #init-input }

Input parameter | Description
----------------|------------
`**kwargs`      | Additional keyword arguments. The `code_dir` keyword provides the path, passed through the `--code_dir` parameter, to the folder where the model code is stored.


#### `init()` example {: #init-example }

=== "Python"

    ``` py
    def init(code_dir):
        global g_code_dir
        g_code_dir = code_dir
    ```

=== "R"

    ``` r
    init <- function(...) {
        library(brnn)
        library(glmnet)
    }
    ```

#### `init()` output {: #init-output }

The `init()` hook does not return anything.


**************************************************


### `load_model()` {: #load-model }

The `load_model()` hook is executed only once at the beginning of the run to load one or more trained objects from multiple artifacts. It is only required when a trained object is stored in an artifact that uses an unsupported format or when multiple artifacts are used. The `load_model()` hook is not required when there is a single artifact in one of the supported formats:

* Python: `.pkl`, `.pth`, `.h5`, `.joblib`
* Java: `.mojo`
* R: `.rds`

``` py
load_model(code_dir: str) -> Any
```

#### `load_model()` input {: #load-model-input }

Input parameter | Description
----------------|------------
`code_dir`      | The path, passed through the `--code_dir` parameter, to the directory where the model artifact and additional code are provided.


#### `load_model()` example {: #load-model-example }

=== "Python"

    ``` py
    def load_model(code_dir):
        model_path = "model.pkl"
        return joblib.load(os.path.join(code_dir, model_path))
    ```


=== "R"

    ``` r
    load_model <- function(input_dir) {
        readRDS(file.path(input_dir, "model_name.rds"))
    }
    ```

#### `load_model()` output {: #load-model-output }

The `load_model()` hook returns a trained object (of any type).
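
For example, when a model spans multiple artifacts, `load_model()` can bundle them into a single object for `score_unstructured()` to use. The file names and the dictionary structure below are illustrative assumptions, not a DRUM convention:

``` py
import os
import pickle

def load_model(code_dir):
    # Hypothetical example: combine two pickled artifacts into one object
    # that score_unstructured() receives as `model`.
    with open(os.path.join(code_dir, "preprocessor.pkl"), "rb") as f:
        preprocessor = pickle.load(f)
    with open(os.path.join(code_dir, "model.pkl"), "rb") as f:
        estimator = pickle.load(f)
    return {"preprocessor": preprocessor, "model": estimator}
```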

**************************************************

### `score_unstructured()` {: #score }

The `score_unstructured()` hook defines the output of a custom inference model and returns predictions (or an arbitrary response) on input data. Do not use this hook for transform models.

``` py
score_unstructured(model: Any, data: str/bytes, **kwargs: Dict[str, Any]) -> str/bytes [, Dict[str, str]]
```

#### `score_unstructured()` input {: #score-input }

Input parameter | Description
----------------|------------
`data`          | Data represented as `str` or `bytes`, depending on the provided `mimetype`.
`model`         | A trained object loaded from the artifact by DataRobot or loaded through the `load_model` hook.
`**kwargs`      | Additional keyword arguments, including the following keys:<ul><li>`mimetype: str`: Indicates the nature and format of the data, taken from the request's `Content-Type` header or the `--content-type` CLI argument in batch mode.</li><li>`charset: str`: Indicates the encoding for text data, taken from the request's `Content-Type` header or the `--content-type` CLI argument in batch mode.</li><li>`query: dict`: Parameters passed as query parameters in an HTTP request or through the `--query` CLI argument in batch mode.</li><li>`headers: dict`: Request headers passed in the HTTP request.</li></ul>


#### `score_unstructured()` examples {: #score-unstructured-examples }

=== "Python"

    ``` py
    def score_unstructured(model, data, query, **kwargs):
        text_data = data.decode("utf8") if isinstance(data, bytes) else data
        text_data = text_data.strip()
        words_count = model.predict(text_data)
        return str(words_count)
    ```

=== "R"

    ``` r
    score_unstructured <- function(model, data, query, ...) {
        # stri_conv() and str_count() require the stringi and stringr
        # packages, typically loaded in the init() hook.
        kwargs <- list(...)

        if (is.raw(data)) {
            data_text <- stri_conv(data, "utf8")
        } else {
            data_text <- data
        }
        count <- str_count(data_text, " ") + 1
        ret <- toString(count)
        ret
    }
    ```

#### `score_unstructured()` output {: #score-unstructured-output }

The `score_unstructured()` hook should return:

* A single value `return data: str/bytes`.
* A tuple `return data: str/bytes, kwargs: dict[str, str]` where `kwargs = {"mimetype": "users/mimetype", "charset": "users/charset"}` can be used to return `mimetype` and `charset` for the `Content-Type` response header.
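
For instance, a hook that returns JSON can set the response `Content-Type` through the second tuple element. This is a minimal sketch; the payload shape is a hypothetical example:

``` py
import json

def score_unstructured(model, data, **kwargs):
    text = data.decode("utf8") if isinstance(data, bytes) else data
    result = json.dumps({"word_count": len(text.split())})
    # The second tuple element sets the Content-Type response header.
    return result, {"mimetype": "application/json", "charset": "utf8"}
```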


**************************************************


## Unstructured model considerations {: #unstructured-model-considerations }

### Incoming data type resolution {: #incoming-data-type-resolution }

The `score_unstructured` hook receives a `data` parameter, which can be of either `str` or `bytes` type.

You can use type-checking methods to verify types:

* Python: `isinstance(data, str)` or `isinstance(data, bytes)`

* R:  `is.character(data)` or `is.raw(data)`

DataRobot uses the `Content-Type` header to determine the type to cast `data` to. The `Content-Type` header can be provided in a request or in the `--content-type` CLI argument.  
The `Content-Type` header format is `type/subtype;parameter` (e.g., `text/plain;charset=utf8`). The following rules apply:

* If `charset` is not defined, the default `utf8` charset is used; otherwise, the provided charset is used to decode the data.

* If `Content-Type` is not defined, the incoming `kwargs={"mimetype": "text/plain", "charset": "utf8"}`, so the data is treated as text, decoded using the `utf8` charset, and passed as `str`.

* If `mimetype` starts with `text/` or equals `application/json`, the data is treated as text, decoded using the provided charset, and passed as `str`.

* For all other `mimetype` values, the data is treated as binary and passed as `bytes`.
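
A hook can apply these rules defensively by checking the incoming type and honoring the negotiated charset. A minimal sketch (the uppercasing step is a placeholder for real scoring logic):

``` py
def score_unstructured(model, data, **kwargs):
    # DataRobot resolves mimetype/charset from the Content-Type header
    # before invoking the hook; the default charset is utf8.
    charset = kwargs.get("charset", "utf8")
    if isinstance(data, bytes):
        text = data.decode(charset)
    else:
        text = data
    return text.upper()  # hypothetical transformation
```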

### Outgoing data and kwargs parameters {: #outgoing-data-and-kwargs-parameters }

As mentioned above, `score_unstructured` can return:

* A single data value: `return data`.

* A tuple of data and additional parameters: `return data, {"mimetype": "some/type", "charset": "some_charset"}`.

#### Server mode {: #server-mode }

In server mode, the following rules apply:

* `return data: str`: The data is treated as text, the default `Content-Type="text/plain;charset=utf8"` header is set in response, and data is encoded and sent using the `utf8` `charset`.

* `return data: bytes`: The data is treated as binary, the default `Content-Type="application/octet-stream;charset=utf8"` header is set in response, and data is sent as-is.

* `return data, kwargs`: If the `mimetype` value is missing in `kwargs`, the default `mimetype` is set according to the data type (`str` -> `text/plain`, `bytes` -> `application/octet-stream`). If the `charset` value is missing, the default `utf8` charset is set. Then, if the data is of type `str`, it is encoded using the resolved `charset` and sent.
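
These rules can be summarized in a small helper that mirrors how the response would be assembled. This is an illustrative sketch of the behavior described above, not DRUM's actual implementation:

``` py
def build_response(ret):
    # `ret` is whatever score_unstructured returned: data, or (data, kwargs).
    data, kwargs = ret if isinstance(ret, tuple) else (ret, {})
    # Default mimetype follows the data type: str -> text, bytes -> binary.
    default_mimetype = "text/plain" if isinstance(data, str) else "application/octet-stream"
    mimetype = kwargs.get("mimetype", default_mimetype)
    charset = kwargs.get("charset", "utf8")
    # str data is encoded with the resolved charset; bytes are sent as-is.
    body = data.encode(charset) if isinstance(data, str) else data
    content_type = f"{mimetype};charset={charset}"
    return body, content_type
```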

#### Batch mode {: #batch-mode }

The best way to debug in batch mode is to provide an `--output` file. The returned data is written to the file according to the type of data returned:

* `str` data is written to a text file using the default `utf8` charset or the `charset` returned in `kwargs`.

* `bytes` data is written to a binary file. The returned `kwargs` are not shown in batch mode, but you can still print them during debugging.

### Auxiliaries {: #auxiliaries }

You can use `datarobot_drum.RuntimeParameters` in your code (e.g., `custom.py`) to read runtime parameters delivered to the executing custom model. Define the runtime parameters in the DataRobot UI. Below is a simple example of how to read string and credential runtime parameters:

``` py
from datarobot_drum import RuntimeParameters

def load_model(code_dir):
    target_url = RuntimeParameters.get("TARGET_URL")
    s3_creds = RuntimeParameters.get("AWS_CREDENTIAL")
    ...
```
